A stream-of-consciousness analysis and exploration of the data.

Load the data set into R

Sample 10,000 observations and look through their data types and summary statistics.
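A minimal sketch of these two steps. It assumes the raw file is named `prosperLoanData.csv`, because the original loading code is not shown, and it uses a small toy data frame so that it runs without the file. The `str()` output below reports text columns as factors. That was the `read.csv()` default before R 4.0.0, so on newer versions you would pass `stringsAsFactors = TRUE` to reproduce it. `sample_rows()` is a hypothetical helper.

```r
# Real data (file name is an assumption):
# loan <- read.csv("prosperLoanData.csv", stringsAsFactors = TRUE)

# Hypothetical helper: reproducible random sample of n rows
sample_rows <- function(df, n, seed = 42) {
  set.seed(seed)
  df[sample(nrow(df), size = min(n, nrow(df))), , drop = FALSE]
}

# Toy stand-in for the full data set (made-up values)
loan <- data.frame(ListingNumber = 1:50,
                   Term = rep(c(36L, 60L), times = 25))

loan_sample <- sample_rows(loan, n = 10)  # n = 10000 on the real data
str(loan_sample)      # data types
summary(loan_sample)  # summary statistics
```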

## 'data.frame':    10000 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 107874 1579 91587 23849 48589 27645 20249 97958 19565 21896 ...
##  $ ListingNumber                      : int  601187 1024213 761118 1083410 22540 151187 305309 540132 64324 1033882 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 57080 97897 73322 100624 1746 11913 21216 46427 4956 99085 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 1 1 1 1 7 5 3 1 6 1 ...
##  $ Term                               : int  36 36 60 36 36 36 36 36 36 60 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 4 4 4 4 3 5 3 4 2 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1 1 1 1 427 1303 1184 1 604 1 ...
##  $ BorrowerAPR                        : num  0.227 0.261 0.223 0.167 0.258 ...
##  $ BorrowerRate                       : num  0.19 0.224 0.198 0.131 0.25 ...
##  $ LenderYield                        : num  0.18 0.213 0.188 0.121 0.245 ...
##  $ EstimatedEffectiveYield            : num  0.176 0.196 0.176 0.116 NA ...
##  $ EstimatedLoss                      : num  0.065 0.1075 0.0674 0.0449 NA ...
##  $ EstimatedReturn                    : num  0.111 0.0882 0.1091 0.0708 NA ...
##  $ ProsperRating..numeric.            : int  4 3 4 5 NA NA NA 3 NA 4 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 5 6 5 4 1 1 1 6 1 5 ...
##  $ ProsperScore                       : num  5 3 5 6 NA NA NA 7 NA 5 ...
##  $ ListingCategory..numeric.          : int  1 1 1 3 0 0 3 7 0 1 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 28 12 33 18 1 1 17 11 12 6 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 37 37 32 61 37 52 8 2 52 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 2 2 2 2 4 8 3 2 4 2 ...
##  $ EmploymentStatusDuration           : int  61 199 36 302 NA 31 13 161 NA 98 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 1 1 2 2 1 1 2 1 2 1 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 1 1 1 1 2 2 1 1 2 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 1 1 159 161 1 1 568 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 57050 97955 73265 100713 1736 11944 21308 46357 5132 99121 ...
##  $ CreditScoreRangeLower              : int  680 720 720 720 560 660 800 660 600 660 ...
##  $ CreditScoreRangeUpper              : int  699 739 739 739 579 679 819 679 619 679 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 9713 5790 2227 7881 3449 4455 6174 9391 6772 7521 ...
##  $ CurrentCreditLines                 : int  16 10 11 9 NA 6 8 6 NA 12 ...
##  $ OpenCreditLines                    : int  16 10 10 8 NA 4 5 6 NA 12 ...
##  $ TotalCreditLinespast7years         : int  23 22 20 18 30 22 21 12 41 25 ...
##  $ OpenRevolvingAccounts              : int  5 6 7 6 2 2 2 4 6 8 ...
##  $ OpenRevolvingMonthlyPayment        : num  345 472 647 533 45 84 5 98 56 477 ...
##  $ InquiriesLast6Months               : int  0 2 0 1 3 7 1 0 6 2 ...
##  $ TotalInquiries                     : num  2 3 3 1 12 14 2 4 19 4 ...
##  $ CurrentDelinquencies               : int  0 0 0 0 7 1 0 2 5 0 ...
##  $ AmountDelinquent                   : num  0 0 0 0 NA 47 0 179 NA 0 ...
##  $ DelinquenciesLast7Years            : int  0 0 0 0 28 1 3 4 49 0 ...
##  $ PublicRecordsLast10Years           : int  0 0 0 0 2 0 0 0 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 0 0 NA 0 0 0 NA 0 ...
##  $ RevolvingCreditBalance             : num  7622 16308 23741 17427 NA ...
##  $ BankcardUtilization                : num  0.8 0.62 0.9 0.75 NA 0.84 0 0.09 NA 0.9 ...
##  $ AvailableBankcardCredit            : num  1691 9892 2345 3810 NA ...
##  $ TotalTrades                        : num  19 19 20 18 NA 19 18 11 NA 20 ...
##  $ TradesNeverDelinquent..percentage. : num  1 1 1 1 NA 0.8 0.94 0.66 NA 0.9 ...
##  $ TradesOpenedLast6Months            : num  0 0 1 0 NA 1 2 0 NA 1 ...
##  $ DebtToIncomeRatio                  : num  0.43 0.54 0.25 0.31 0.26 0.27 0.1 0.22 0.13 0.18 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 6 3 5 7 5 5 4 7 3 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3333 6667 10406 4583 3167 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 54620 7455 53342 95577 84689 31396 20225 104350 111631 30081 ...
##  $ TotalProsperLoans                  : int  1 NA NA NA NA NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  21 NA NA NA NA NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  21 NA NA NA NA NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  0 NA NA NA NA NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  0 NA NA NA NA NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  1000 NA NA NA NA NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  0 NA NA NA NA NA NA NA NA NA ...
##  $ ScorexChangeAtTimeOfListing        : int  124 NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 294 0 0 2286 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA 31 NA NA 15 NA ...
##  $ LoanMonthsSinceOrigination         : int  21 3 10 3 92 81 71 28 87 3 ...
##  $ LoanNumber                         : int  68892 119639 89737 123208 1788 16437 29853 56394 5215 120923 ...
##  $ LoanOriginalAmount                 : int  9000 10000 25000 10400 2550 13000 10000 2000 3500 25000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 1449 1807 1660 1819 130 373 577 1299 233 1812 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 15 33 16 33 17 10 11 31 26 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 42919 85223 62619 71297 7382 46727 55820 80117 13351 5608 ...
##  $ MonthlyLoanPayment                 : num  330 384 660 351 101 ...
##  $ LP_CustomerPayments                : num  6600 1151 6601 702 3226 ...
##  $ LP_CustomerPrincipalPayments       : num  4366 610 2658 473 2557 ...
##  $ LP_InterestandFees                 : num  2234 541 3943 230 670 ...
##  $ LP_ServiceFees                     : num  -117.5 -24.2 -198.7 -17.5 -13.1 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 400 0 0 0 0 ...
##  $ Investors                          : int  164 16 2 153 18 328 203 13 48 134 ...
##                    ListingKey   ListingNumber    
##  2E4135942338729616CBB84:   2   Min.   :     34  
##  32943590099161153292459:   2   1st Qu.: 402053  
##  43A53602342416699E07666:   2   Median : 596793  
##  D30235991099246051BBA60:   2   Mean   : 624707  
##  00033425227988088FA6752:   1   3rd Qu.: 883750  
##  0006357747559732389619D:   1   Max.   :1250010  
##  (Other)                :9990                    
##                     ListingCreationDate  CreditGrade        Term      
##  2013-09-21 11:35:49.803000000:   2            :7464   Min.   :12.00  
##  2013-11-01 16:35:05.490000000:   2     C      : 513   1st Qu.:36.00  
##  2014-01-05 13:42:23.993000000:   2     D      : 403   Median :36.00  
##  2014-02-02 21:53:24.147000000:   2     B      : 387   Mean   :40.81  
##  2005-11-28 16:16:35.077000000:   1     AA     : 330   3rd Qu.:36.00  
##  2005-11-29 13:29:16.810000000:   1     HR     : 319   Max.   :60.00  
##  (Other)                      :9990     (Other): 584                  
##                  LoanStatus                 ClosedDate    BorrowerAPR     
##  Current              :4869                      :5067   Min.   :0.02659  
##  Completed            :3420   2014-01-14 00:00:00:  15   1st Qu.:0.15549  
##  Chargedoff           :1112   2013-03-26 00:00:00:  12   Median :0.21025  
##  Defaulted            : 399   2014-03-04 00:00:00:  12   Mean   :0.21894  
##  Past Due (1-15 days) :  79   2012-11-06 00:00:00:  11   3rd Qu.:0.28386  
##  Past Due (31-60 days):  31   2013-06-26 00:00:00:  11   Max.   :0.45857  
##  (Other)              :  90   (Other)            :4872   NA's   :3        
##   BorrowerRate     LenderYield     EstimatedEffectiveYield
##  Min.   :0.0100   Min.   :0.0000   Min.   :-0.0795        
##  1st Qu.:0.1334   1st Qu.:0.1234   1st Qu.: 0.1157        
##  Median :0.1840   Median :0.1740   Median : 0.1616        
##  Mean   :0.1928   Mean   :0.1828   Mean   : 0.1692        
##  3rd Qu.:0.2500   3rd Qu.:0.2400   3rd Qu.: 0.2254        
##  Max.   :0.4500   Max.   :0.4325   Max.   : 0.3199        
##                                    NA's   :2548           
##  EstimatedLoss    EstimatedReturn   ProsperRating..numeric.
##  Min.   :0.0049   Min.   :-0.0795   Min.   :1.000          
##  1st Qu.:0.0424   1st Qu.: 0.0741   1st Qu.:3.000          
##  Median :0.0712   Median : 0.0927   Median :4.000          
##  Mean   :0.0800   Mean   : 0.0966   Mean   :4.077          
##  3rd Qu.:0.1120   3rd Qu.: 0.1174   3rd Qu.:5.000          
##  Max.   :0.3660   Max.   : 0.2837   Max.   :7.000          
##  NA's   :2548     NA's   :2548      NA's   :2548           
##  ProsperRating..Alpha.  ProsperScore    ListingCategory..numeric.
##         :2548          Min.   : 1.000   Min.   : 0.000           
##  C      :1619          1st Qu.: 4.000   1st Qu.: 1.000           
##  B      :1338          Median : 6.000   Median : 1.000           
##  A      :1302          Mean   : 6.003   Mean   : 2.761           
##  D      :1249          3rd Qu.: 8.000   3rd Qu.: 3.000           
##  E      : 860          Max.   :11.000   Max.   :20.000           
##  (Other):1084          NA's   :2548                              
##  BorrowerState                     Occupation        EmploymentStatus
##  CA     :1284   Other                   :2433   Employed     :5883   
##  TX     : 644   Professional            :1187   Full-time    :2303   
##  FL     : 619   Computer Programmer     : 364   Self-employed: 539   
##  NY     : 532   Executive               : 356   Not available: 466   
##  IL     : 525                           : 325   Other        : 328   
##         : 497   Administrative Assistant: 317                : 210   
##  (Other):5899   (Other)                 :5018   (Other)      : 271   
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:4916          False:8922      
##  1st Qu.: 26.00           True :5084          True :1078      
##  Median : 67.00                                               
##  Mean   : 96.09                                               
##  3rd Qu.:136.00                                               
##  Max.   :648.00                                               
##  NA's   :679                                                  
##                     GroupKey                         DateCreditPulled
##                         :8858   2013-11-01 16:35:12          :   2   
##  783C3371218786870A73D20:  97   2013-11-02 08:55:31          :   2   
##  3D4D3366260257624AB272D:  76   2014-02-02 21:53:27          :   2   
##  6A3B336601725506917317E:  59   2014-02-18 11:22:58          :   2   
##  FEF83377364176536637E50:  52   2005-11-28 10:19:33.010000000:   1   
##  CD0E3364909037313F32874:  34   2005-11-28 16:16:35.077000000:   1   
##  (Other)                : 824   (Other)                      :9990   
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :60            NA's   :60           
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines 
##                     :  66       Min.   : 0.00      Min.   : 0.000  
##  1994-11-01 00:00:00:  22       1st Qu.: 6.75      1st Qu.: 6.000  
##  1996-10-01 00:00:00:  22       Median :10.00      Median : 9.000  
##  1990-04-01 00:00:00:  20       Mean   :10.35      Mean   : 9.296  
##  1990-05-01 00:00:00:  19       3rd Qu.:13.00      3rd Qu.:12.000  
##  1996-11-01 00:00:00:  19       Max.   :52.00      Max.   :51.000  
##  (Other)            :9832       NA's   :676        NA's   :676     
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.000       
##  1st Qu.: 17.00             1st Qu.: 4.000       
##  Median : 25.00             Median : 6.000       
##  Mean   : 26.97             Mean   : 7.012       
##  3rd Qu.: 35.00             3rd Qu.: 9.000       
##  Max.   :127.00             Max.   :51.000       
##  NA's   :66                                      
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   : 0.000       Min.   :  0.000  
##  1st Qu.:  113.0             1st Qu.: 0.000       1st Qu.:  2.000  
##  Median :  268.0             Median : 1.000       Median :  4.000  
##  Mean   :  396.8             Mean   : 1.397       Mean   :  5.538  
##  3rd Qu.:  527.0             3rd Qu.: 2.000       3rd Qu.:  7.000  
##  Max.   :10977.0             Max.   :52.000       Max.   :106.000  
##                              NA's   :66           NA's   :113      
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5791      Mean   :   934.1   Mean   : 4.162         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :31.0000      Max.   :255963.0   Max.   :99.000         
##  NA's   :66           NA's   :677        NA's   :93             
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   :0.0000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.:0.0000            1st Qu.:   3085       
##  Median : 0.0000          Median :0.0000            Median :   8338       
##  Mean   : 0.3179          Mean   :0.0147            Mean   :  17879       
##  3rd Qu.: 0.0000          3rd Qu.:0.0000            3rd Qu.:  19457       
##  Max.   :15.0000          Max.   :4.0000            Max.   :1433328       
##  NA's   :66               NA's   :676               NA's   :676           
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.0000      Min.   :     0          Min.   :  1.00  
##  1st Qu.:0.3100      1st Qu.:   875          1st Qu.: 15.00  
##  Median :0.6100      Median :  4139          Median : 22.00  
##  Mean   :0.5644      Mean   : 11133          Mean   : 23.36  
##  3rd Qu.:0.8400      3rd Qu.: 13017          3rd Qu.: 30.00  
##  Max.   :2.3600      Max.   :292662          Max.   :114.00  
##  NA's   :676         NA's   :672             NA's   :672     
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.0000                     Min.   : 0.0000        
##  1st Qu.:0.8200                     1st Qu.: 0.0000        
##  Median :0.9400                     Median : 0.0000        
##  Mean   :0.8864                     Mean   : 0.8014        
##  3rd Qu.:1.0000                     3rd Qu.: 1.0000        
##  Max.   :1.0000                     Max.   :17.0000        
##  NA's   :672                        NA's   :672            
##  DebtToIncomeRatio         IncomeRange   IncomeVerifiable
##  Min.   : 0.0000   $50,000-74,999:2776   False: 787      
##  1st Qu.: 0.1400   $25,000-49,999:2724   True :9213      
##  Median : 0.2200   $100,000+     :1575                   
##  Mean   : 0.2662   $75,000-99,999:1422                   
##  3rd Qu.: 0.3100   Not displayed : 688                   
##  Max.   :10.0100   $1-24,999     : 664                   
##  NA's   :772       (Other)       : 151                   
##  StatedMonthlyIncome                    LoanKey     TotalProsperLoans
##  Min.   :     0      09303699897852595CD59DD:   2   Min.   :1.000    
##  1st Qu.:  3250      5E9337054508165362CD556:   2   1st Qu.:1.000    
##  Median :  4667      64A8370161790336267B379:   2   Median :1.000    
##  Mean   :  5611      F86637075491079348E0575:   2   Mean   :1.459    
##  3rd Qu.:  6833      000537001363220451EA011:   1   3rd Qu.:2.000    
##  Max.   :140417      00093662314540397D8EFEA:   1   Max.   :7.000    
##                      (Other)                :9990   NA's   :8062     
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.0        
##  1st Qu.:  9.00             1st Qu.:  9.0        
##  Median : 16.00             Median : 16.0        
##  Mean   : 23.38             Mean   : 22.7        
##  3rd Qu.: 34.00             3rd Qu.: 32.0        
##  Max.   :141.00             Max.   :141.0        
##  NA's   :8062               NA's   :8062         
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.000                      Min.   :0.000                  
##  1st Qu.: 0.000                      1st Qu.:0.000                  
##  Median : 0.000                      Median :0.000                  
##  Mean   : 0.628                      Mean   :0.056                  
##  3rd Qu.: 0.000                      3rd Qu.:0.000                  
##  Max.   :33.000                      Max.   :8.000                  
##  NA's   :8062                        NA's   :8062                   
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   : 1000            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6100            Median : 1741              
##  Mean   : 8671            Mean   : 2932              
##  3rd Qu.:11364            3rd Qu.: 4234              
##  Max.   :56494            Max.   :23034              
##  NA's   :8062             NA's   :8062               
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-167.000            Min.   :   0.0           
##  1st Qu.: -35.000            1st Qu.:   0.0           
##  Median :  -2.000            Median :   0.0           
##  Mean   :  -5.021            Mean   : 156.5           
##  3rd Qu.:  20.000            3rd Qu.:   0.0           
##  Max.   : 214.000            Max.   :2703.0           
##  NA's   :8335                                         
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   : 0.00              Min.   :    16  
##  1st Qu.:10.00                 1st Qu.: 6.00              1st Qu.: 37510  
##  Median :15.00                 Median :21.00              Median : 67989  
##  Mean   :16.47                 Mean   :32.04              Mean   : 69024  
##  3rd Qu.:22.00                 3rd Qu.:65.00              3rd Qu.:101087  
##  Max.   :40.00                 Max.   :99.00              Max.   :136378  
##  NA's   :8503                                                             
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2013-11-13 00:00:00:  41     Q4 2013:1241          
##  1st Qu.: 3890      2013-10-16 00:00:00:  38     Q1 2014:1011          
##  Median : 6010      2014-01-22 00:00:00:  38     Q3 2013: 829          
##  Mean   : 8303      2014-01-14 00:00:00:  33     Q2 2013: 658          
##  3rd Qu.:12000      2014-02-19 00:00:00:  33     Q3 2012: 498          
##  Max.   :35000      2013-09-24 00:00:00:  30     Q2 2012: 464          
##                     (Other)            :9787     (Other):5299          
##                    MemberKey    MonthlyLoanPayment LP_CustomerPayments
##  C70934206057523078260C7:   4   Min.   :   0.0     Min.   :    0      
##  077435186242217874F4D0B:   3   1st Qu.: 130.3     1st Qu.: 1045      
##  5BAA3507676872666AA5774:   3   Median : 216.4     Median : 2623      
##  7AA03366669917702D0CF2B:   3   Mean   : 271.1     Mean   : 4175      
##  A94433662517143290CC132:   3   3rd Qu.: 372.3     3rd Qu.: 5494      
##  AE513536468130556EE5F83:   3   Max.   :1808.8     Max.   :40548      
##  (Other)                :9981                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :    0.0    Min.   :-538.46  
##  1st Qu.:  536.1              1st Qu.:  283.2    1st Qu.: -73.19  
##  Median : 1637.2              Median :  709.1    Median : -34.83  
##  Mean   : 3082.4              Mean   : 1092.2    Mean   : -55.30  
##  3rd Qu.: 4000.0              3rd Qu.: 1467.4    3rd Qu.: -14.19  
##  Max.   :35000.0              Max.   :15547.7    Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-6221.32   Min.   :    0.0       Min.   : -269.2    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -15.15   Mean   :  745.5       Mean   :  726.7    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :24317.9       Max.   :24317.9    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations  
##  Min.   :    0.00                Min.   :0.7014   Min.   : 0.0000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.0000  
##  Median :    0.00                Median :1.0000   Median : 0.0000  
##  Mean   :   26.95                Mean   :0.9985   Mean   : 0.0473  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.0000  
##  Max.   :11857.11                Max.   :1.0000   Max.   :24.0000  
##                                                                    
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   :0.000              Min.   :    0.00            Min.   :   1.00  
##  1st Qu.:0.000              1st Qu.:    0.00            1st Qu.:   2.00  
##  Median :0.000              Median :    0.00            Median :  44.00  
##  Mean   :0.019              Mean   :   13.07            Mean   :  81.99  
##  3rd Qu.:0.000              3rd Qu.:    0.00            3rd Qu.: 118.00  
##  Max.   :5.000              Max.   :10000.00            Max.   :1189.00  
## 

We do some simple data cleaning to make the data easier to examine.

Looking at the data, we note that ClosedDate is stored as a factor, so we convert it into a date before splitting it into year, month and day.

First we convert ClosedDate into standard date-time format. We then use the separate function to create new variables from it.
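The original chunk (not shown) uses `separate()` from tidyr. A dependency-free alternative is sketched below in base R on a toy column. Blank strings, which mark loans that are still open, are set to `NA` first. `as.Date()` with an explicit format then parses the rest, and `format()` extracts each part. The column names `ClosedYear` and `ClosedMonth` are assumptions, while `ClosedDay` matches the name used later in this analysis.

```r
# Toy stand-in for loan_sample$ClosedDate (made-up values);
# a blank string marks a loan that has not closed yet.
loan_sample <- data.frame(
  ClosedDate = factor(c("2005-11-25 00:00:00", "", "2014-01-14 00:00:00"))
)

closed_chr <- as.character(loan_sample$ClosedDate)  # factor -> character
closed_chr[closed_chr == ""] <- NA                  # blank -> missing
# Parsing stops after the date, so the trailing time part is ignored
closed <- as.Date(closed_chr, format = "%Y-%m-%d")

loan_sample$ClosedYear  <- as.integer(format(closed, "%Y"))
loan_sample$ClosedMonth <- as.integer(format(closed, "%m"))
loan_sample$ClosedDay   <- as.integer(format(closed, "%d"))
loan_sample
```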

Next, we arrange the IncomeRange and LoanOriginationQuarter factors in a more intuitive order.
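A minimal base-R sketch on toy values. The income order follows the one used in the grouped summaries later in this analysis, with `Not displayed` and `Not employed` first. Quarter labels such as `Q4 2013` are sorted by year and then by quarter, because their default alphabetical order (`Q1 2006`, `Q1 2007`, ...) is not chronological.

```r
# Toy stand-in for loan_sample (made-up values)
df <- data.frame(
  IncomeRange = c("$100,000+", "$0", "Not displayed", "$25,000-49,999"),
  LoanOriginationQuarter = c("Q1 2014", "Q4 2013", "Q1 2006", "Q3 2013")
)

# Income brackets: "no information" first, then lowest to highest
income_levels <- c("Not displayed", "Not employed", "$0", "$1-24,999",
                   "$25,000-49,999", "$50,000-74,999", "$75,000-99,999",
                   "$100,000+")
df$IncomeRange <- factor(df$IncomeRange, levels = income_levels)

# Chronological quarter order: labels look like "Q4 2013"
quarters <- unique(as.character(df$LoanOriginationQuarter))
q_year   <- as.integer(substr(quarters, 4, 7))  # "2013"
q_num    <- as.integer(substr(quarters, 2, 2))  # "4"
df$LoanOriginationQuarter <- factor(df$LoanOriginationQuarter,
                                    levels = quarters[order(q_year, q_num)])
levels(df$LoanOriginationQuarter)
```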

Univariate

Looking at the distribution of CreditGrade, we notice that a significant number of borrowers (7,464 of the 10,000 sampled) have no credit grade. Setting those borrowers aside, we can see that most of the graded borrowers are in C or D.
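A minimal sketch of the "strip the ungraded borrowers" step on made-up grades. It counts the remaining grades and drops the empty `""` level so that it does not show up as an empty bar.

```r
# Toy stand-in for loan_sample$CreditGrade; "" means "not graded"
df <- data.frame(
  CreditGrade = factor(c("", "C", "D", "", "C", "AA", "D", "C"))
)

# Drop ungraded borrowers, then drop the now-unused "" level
graded <- droplevels(df[df$CreditGrade != "", , drop = FALSE])
grade_counts <- sort(table(graded$CreditGrade), decreasing = TRUE)
grade_counts

# Plot with ggplot2: ggplot(graded, aes(x = CreditGrade)) + geom_bar()
```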

The following graph makes it clear why CreditGrade has so many blanks: after Q4 2008, borrowers are no longer credit graded. The second plot verifies this. Thus, there seems to be a structural change in the lending model after Q4 2008.
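The switch can also be checked numerically. The minimal sketch below uses made-up rows and computes the share of blank CreditGrade values per origination quarter. If the observation above holds, the share on the real data should jump to 1 after Q4 2008. Note that `tapply()` orders the quarters alphabetically unless the factor has been reordered.

```r
# Toy stand-in: quarter of origination and CreditGrade ("" = not graded)
df <- data.frame(
  LoanOriginationQuarter = c("Q3 2008", "Q3 2008", "Q4 2008",
                             "Q3 2009", "Q3 2009"),
  CreditGrade = c("C", "D", "B", "", "")
)

# Share of ungraded loans per quarter (mean of a logical = proportion)
blank_share <- tapply(df$CreditGrade == "", df$LoanOriginationQuarter, mean)
blank_share
```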

Given this structural change (applicants after Q4 2008 are no longer credit graded by the same measure), we can explore whether the models used to evaluate borrowers before and after Q4 2008 differ significantly, and which is better.

We add a new binary variable to the data set showing whether each loan is pre-2008 or post-2008.
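A minimal sketch of this flag. It assumes a cutoff of 1 January 2009 on LoanOriginationDate, so that Q4 2008 counts as pre-2008, consistent with the observation above. The column name `Post2008` is hypothetical. `loan_sample_pre` and `loan_sample_post` are the subset names used later in the analysis.

```r
# Toy stand-in for loan_sample (made-up dates)
loan_sample <- data.frame(
  LoanOriginationDate = factor(c("2007-05-01 00:00:00", "2008-12-30 00:00:00",
                                 "2009-07-15 00:00:00", "2013-11-13 00:00:00"))
)

orig_date <- as.Date(as.character(loan_sample$LoanOriginationDate),
                     format = "%Y-%m-%d")
# Assumed cutoff: loans originated from 2009-01-01 onward are "post-2008"
loan_sample$Post2008 <- orig_date >= as.Date("2009-01-01")

loan_sample_pre  <- subset(loan_sample, !Post2008)
loan_sample_post <- subset(loan_sample, Post2008)
```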

Credit quality can also be measured by the delinquent amount as a percentage of the original loan amount.

Removing all the individuals with no delinquent amount, we note that most delinquencies are less than 25% of the original loan amount. Beyond that point, the number of borrowers drops off significantly.
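A minimal sketch of this measure on made-up values. The column name `PercentDelinquent` is hypothetical, while AmountDelinquent and LoanOriginalAmount are columns in the data set. AmountDelinquent is missing for some borrowers (677 NAs in the summary above), so missing values are dropped together with zero amounts.

```r
# Toy stand-in for loan_sample (made-up amounts)
df <- data.frame(
  AmountDelinquent   = c(0, 500, NA, 200, 8000, 0),
  LoanOriginalAmount = c(5000, 10000, 4000, 2000, 10000, 3000)
)

# Delinquent amount as a percentage of the original loan amount
df$PercentDelinquent <- 100 * df$AmountDelinquent / df$LoanOriginalAmount

# Keep only borrowers with a positive, non-missing delinquent amount
delinquent <- subset(df, !is.na(PercentDelinquent) & PercentDelinquent > 0)

# Share of those delinquencies below 25% of the original amount
share_below_25 <- mean(delinquent$PercentDelinquent < 25)
share_below_25
```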

A simple function to create bar plots with a given variable and data set.

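library(ggplot2)

# Bar plot of the column named by the string `variable1` in `dataset`.
# standardplot = TRUE: default bar plot with vertical x-axis labels.
# standardplot = FALSE: numeric x axis with a break every `xbreaks` units,
#   optionally zoomed to counts between `ylower` and `yupper`.
create_bar <- function(dataset,
                       variable1,
                       xbreaks = 1,
                       ylower = 0,
                       yupper = NULL,
                       standardplot = TRUE) {
  # .data[[...]] replaces aes_string(), which is soft-deprecated in ggplot2
  p <- ggplot(dataset, aes(x = .data[[variable1]])) +
    geom_bar()
  if (standardplot) {
    return(p + theme(axis.text.x = element_text(angle = 90)))
  }
  maxx <- max(dataset[[variable1]], na.rm = TRUE)
  p <- p + scale_x_continuous(breaks = seq(0, maxx, xbreaks))
  if (!is.null(yupper)) {
    p <- p + coord_cartesian(ylim = c(ylower, yupper))
  }
  p
}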

The year 2013 seems to be the peak year of borrowing, with a sharp fall in 2014. Part of this fall is likely because 2014 is only partially covered: the latest dates visible in the summary above fall in Q1 2014.

We see more loan closures in February, with a second spike in September. Loan closures also tend to increase towards the end of each quarter.

The 22nd seems to be the most common day of the month for loans to close.
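The counts behind these plots can be read off directly with `table()`. The minimal sketch below uses made-up closure dates and finds the most common day of the month. Missing dates (open loans) are dropped by `table()`. The same pattern with `"%m"` or `"%Y"` gives the monthly and yearly counts.

```r
# Made-up closure dates; NA stands for a loan that is still open
closed <- as.Date(c("2013-02-22", "2013-09-22", "2014-01-14",
                    "2013-02-05", NA))

closed_day <- as.integer(format(closed, "%d"))  # day of month, 1-31
day_counts <- table(closed_day)                 # NAs are excluded
busiest_day <- as.integer(names(which.max(day_counts)))
busiest_day
```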

Most borrowers are in the $25,000-49,999 and $50,000-74,999 income ranges.

The majority of incomes are verifiable (9,213 of the 10,000 sampled).

The majority of loans have no more than 300 investors.

BorrowerAPR is concentrated between 10% and 30%, with peaks around 15% and 23%.

Similarly, LenderYield is concentrated between 5% and 30%, with peaks around 15%, 23% and 30%.

California has the highest number of loan applicants, followed by Texas and Florida.

About half of the sampled borrowers are homeowners.

Borrowers typically have 8 to 9 open credit lines (the summary above gives a median of 9 and a mean of 9.3).

BankcardUtilization is high, and the count of borrowers increases with bank card utilization.

Borrower counts are highest in Q4 2013, followed by Q1 2014 and Q3 2013.

The distributions of investor counts pre-2008 and post-2008 look rather similar.

For ease of comparison, I have plotted both samples on the same axes, showing counts up to 100 and at most 800 investors.

Main Takeaways

  1. There is a structural change in credit lending standards at the end of 2008.
  2. Most delinquencies are less than 25% of the original loan amount.
  3. Pre-2008, most of the borrowers with a CreditGrade are in C or D.
  4. Most borrowers are in the income range $25,000-74,999.
  5. The distribution of investor counts looks broadly the same pre-2008 and post-2008.
  6. The typical borrower has 8 to 9 open credit lines.

Bivariate

To get a general overview of the data and find interesting relationships to explore, we use ggpairs (from the GGally package) to plot the pairwise relationships between variables.
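`ggpairs()` draws scatter plots and correlations for every pair of columns. A dependency-free numeric counterpart is the pairwise correlation matrix. The minimal sketch below computes it on made-up values for three of the columns. The column choice is illustrative only.

```r
# Toy stand-in for a few numeric columns of loan_sample (made-up values)
df <- data.frame(
  BorrowerAPR = c(0.10, 0.18, 0.25, 0.31, NA),
  LenderYield = c(0.08, 0.16, 0.23, 0.29, 0.20),
  Investors   = c(210, 140, 60, 15, 90)
)

# Pairwise correlations; each pair uses all rows where both values exist
cor_matrix <- cor(df, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# Plot version (requires the GGally package):
# GGally::ggpairs(df)
```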

As expected, there is a strong correlation between the rate charged to the borrower (BorrowerAPR) and what the lender earns (LenderYield, the interest rate net of the servicing fee). The picture is very much the same across the different states.

This is further confirmed by the plot of the correlation between LenderYield and BorrowerAPR in each state.

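library(dplyr)
library(ggplot2)

# One correlation per state. summarise() keeps a single row per group
# (mutate() would repeat the value on every loan), and
# use = "complete.obs" skips loans with a missing BorrowerAPR.
A1 <- loan_sample %>%
  group_by(BorrowerState) %>%
  summarise(COR = cor(LenderYield, BorrowerAPR, use = "complete.obs"))

ggplot(A1, aes(x = BorrowerState, y = COR)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))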

Plotting the number of loan closures on each ClosedDay, broken down by month, reveals something very interesting. Closures seem to spike at the start of the week and taper off towards the end of the week, with further spikes towards the end of the month. Because ClosedDay is the day of the month rather than the day of the week, the weekly pattern is worth confirming against the weekday of ClosedDate itself.
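A minimal sketch of that weekday check on made-up dates. `as.POSIXlt()` gives the weekday as a number (0 = Sunday), which avoids the locale-dependent names returned by `weekdays()`.

```r
# Made-up closure dates
closed <- as.Date(c("2014-01-13", "2014-01-14", "2014-01-20",
                    "2013-03-26", "2013-11-08"))

wday <- as.POSIXlt(closed)$wday  # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
weekday_counts <- table(factor(wday, levels = 0:6,
                               labels = c("Sun", "Mon", "Tue", "Wed",
                                          "Thu", "Fri", "Sat")))
weekday_counts
```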

Something interesting occurs when we plot the bar graph of IncomeRange faceted by CreditGrade (excluding the post-2008 samples). The income distribution looks very similar for each CreditGrade, with the peak at about the $25,000-49,999 income range.

Furthermore,

  1. Income range seems to have minimal impact on an individual's CreditGrade, since the income distribution shows no noticeable skew for any CreditGrade.

  2. There are observably fewer borrowers in the “$1-24,999” category than in the “$50,000-74,999” category, which is slightly counterintuitive given the trend across the brackets from $25,000 to $100,000+.

However, placed beside the post-2008 sample, the income profile of the borrowers shows obvious differences.

  1. Income data collection seems to have improved, with borrowers reported as “Not displayed” disappearing altogether.

  2. More importantly, there is a slight change in the income profile, with higher-income applicants using the credit facility more post-2008.

To confirm that the income data is reliable, we check that the majority of incomes are verified.
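A minimal sketch of this check on made-up values, giving the proportion of borrowers in each IncomeVerifiable category.

```r
# Toy stand-in for loan_sample$IncomeVerifiable (made-up values)
df <- data.frame(IncomeVerifiable = factor(c("True", "True", "False", "True")))

# Proportion of borrowers in each category
verified_share <- prop.table(table(df$IncomeVerifiable))
verified_share
```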

We then remove the borrowers whose incomes cannot be verified.

# Keep only borrowers whose income is verifiable
loan_sample_pre1 <- subset(loan_sample_pre, IncomeVerifiable == "True")
loan_sample_post1 <- subset(loan_sample_post, IncomeVerifiable == "True")

The percentage of delinquent amount varies with income range, but the next plot shows that the histograms have rather similar shapes.

We assume that BorrowerAPR and the number of investors are determined independently. BorrowerAPR is set by the lending company. Investors then take a second look at each listing and decide where best to invest their funds, regardless of the APR the lending company charges. In other words, even if the lending company charges a very low APR, a borrower whom investors consider risky will attract very few investors.

The scatter plot below suggests a general trend in which riskier borrowers (as measured by BorrowerAPR) attract fewer investors.
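The visual trend can be backed by a rank correlation. The minimal sketch below uses made-up values and Spearman's method, which only assumes a monotonic (not linear) relationship. A negative coefficient is what "riskier borrowers attract fewer investors" would predict. `cor.test(..., method = "spearman")` would add a p-value.

```r
# Made-up values; NA marks a missing BorrowerAPR
apr       <- c(0.10, 0.15, 0.20, 0.25, 0.30, NA)
investors <- c(300, 150, 90, 40, 5, 20)

# Rank correlation using only rows where both values are present
rho <- cor(apr, investors, method = "spearman", use = "complete.obs")
rho
```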

Borrowers in the $25,000-74,999 income range, the largest group, largely set the typical number of open credit lines. Higher-income individuals have slightly more open credit lines and lower-income individuals have fewer.

There seems to be a relationship between number of open credit lines and income range.
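One way to put numbers on this relationship is a per-bracket median. The minimal sketch below uses made-up values. With IncomeRange ordered as a factor, the medians read from lowest to highest income.

```r
# Toy stand-in for loan_sample (made-up values), brackets in income order
df <- data.frame(
  IncomeRange = factor(c("$1-24,999", "$1-24,999", "$50,000-74,999",
                         "$50,000-74,999", "$100,000+"),
                       levels = c("$1-24,999", "$50,000-74,999", "$100,000+")),
  OpenCreditLines = c(4, 6, 8, 10, 12)
)

# Median open credit lines per income bracket (NAs ignored)
median_lines <- tapply(df$OpenCreditLines, df$IncomeRange, median, na.rm = TRUE)
median_lines
```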

Main Takeaways:

  1. Most incomes are verified.
  2. There is better data collection post-2008, with far fewer “Not displayed” incomes.
  3. Borrowers have a slightly different income profile pre-2008 and post-2008.
      1. Pre-2008, most borrowers are in the “$25,000-49,999” or “Not displayed” income range.
      2. Post-2008, most borrowers are in the “$50,000-74,999” income range.
  4. Riskier borrowers (as measured by BorrowerAPR) attract fewer investors.
  5. The histograms of the percentage of delinquent amount have similar shapes across income groups.
  6. Income seems to have minimal impact on an individual's CreditGrade.
  7. Loan closures seem to spike at the start of the week and taper off towards the end of the week.
      1. Closures also tend to spike towards the end of the month.
  8. There is a strong correlation between BorrowerAPR and LenderYield.
      1. The picture is the same across all the different states.
  9. The higher an individual's income, the more open credit lines they tend to have.

Multivariate Plots

Post-2008, individuals generally have more open credit lines, except in the top income group.

## loan_sample_pre$IncomeRange: Not displayed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   4.750   6.500   6.917  10.000  14.000     673 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: Not employed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   6.000   8.667   8.500  35.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   6.500   7.446  10.000  26.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.000   5.000   5.368   7.000  14.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   7.000   7.124   9.000  43.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.000   8.000   9.007  12.000  29.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   6.000   9.000   9.924  12.000  32.000 
## -------------------------------------------------------- 
## loan_sample_pre$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    8.00   12.00   12.76   17.00   51.00
## loan_sample_post$IncomeRange: Not displayed
## NULL
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: Not employed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   6.000   7.915  11.000  25.000 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.000  10.000   8.222  11.000  16.000 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.250   6.000   6.249   8.000  21.000 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.000   8.000   8.214  11.000  34.000 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   6.000   9.000   9.481  12.000  34.000 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    7.00    9.00   10.06   13.00   36.00 
## -------------------------------------------------------- 
## loan_sample_post$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    8.00   11.00   11.49   15.00   38.00
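The per-group summaries above have the layout that base R's `by()` prints. A minimal sketch of how such summaries can be produced. `summarise_by_income` is a hypothetical helper, and `OpenCreditLines` is an assumed column name, since the output does not name the variable it summarises.

```r
# Hypothetical helper: summary() of one variable within each IncomeRange level.
# Empty levels (e.g. "Not displayed" post-2008) show up as NULL, as above.
summarise_by_income <- function(df, var = "OpenCreditLines") {
  by(df[[var]], df$IncomeRange, summary)
}

# Usage (not run here); the direct call mirrors the headers shown above:
# by(loan_sample_pre$OpenCreditLines, loan_sample_pre$IncomeRange, summary)
# summarise_by_income(loan_sample_post)
```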

Another way to look at the same data is through boxplots, with the individual data points displaced (jittered) so that the outliers are easier to see.

To understand why higher-income individuals are applying for the loan facility much more frequently, we look at the following two boxplots. Post-2008, the amount of credit in the accounts is actually LOWER for the top two income groups, whereas it seems to be higher for the next two income groups ($25,000-49,999 and $50,000-74,999).

Although higher-income individuals have more open credit lines and applications, the amount of credit available to them seems to have decreased.

We have established that the average amount of credit available to the top two income ranges has decreased, whilst that of the next two income ranges has increased. Turning to utilisation, it seems to have changed little for the top income brackets but to have decreased for the next three income groups.

A charitable observer might claim that borrowers in the lower income ranges have become more “disciplined” in their use of the loan facility post-2008, given the reduction in bank card utilisation and the fact that lenders have increased the amount of credit available to them.

To test this claim, we measure how disciplined borrowers are with their finances using their debt-to-income ratio. Interestingly, this does not seem to be the case: the mean DebtToIncomeRatio has risen post-2008 for almost all income groups.

In fact, individuals in the lower income ranges seem to have become “less disciplined”.

## loan_sample_pre1$IncomeRange: Not displayed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0900  0.1600  0.2092  0.2500  5.5600       4 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: Not employed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.09    0.14    0.19    0.19    0.24    0.29       5 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##      NA      NA      NA     NaN      NA      NA       4 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0300  0.1500  0.2700  0.4893  0.4300 10.0100 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1450  0.2400  0.2599  0.3400  0.9200 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1300  0.2000  0.2235  0.3000  0.8500 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1100  0.1650  0.1854  0.2425  0.5900 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.0900  0.1500  0.1686  0.2400  0.4200
## loan_sample_post1$IncomeRange: Not displayed
## NULL
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: Not employed
## NULL
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##      NA      NA      NA     NaN      NA      NA       1 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0500  0.2000  0.3400  0.5109  0.5300 10.0100 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1800  0.2700  0.2873  0.3700  1.5100 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.020   0.160   0.230   0.238   0.310   1.210 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1300  0.2000  0.2076  0.2700  0.7800 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1200  0.1700  0.1768  0.2300  0.5300
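The rise in mean DebtToIncomeRatio can be read off the two summaries above, but it can also be computed in one step. A minimal sketch. `group_means` and `mean_change_by_income` are hypothetical helpers; positive values mean the post-2008 group mean is higher, and groups with no data in one period come out as NA or NaN.

```r
# Hypothetical helper: mean of `values` within each level of `groups`.
group_means <- function(values, groups) {
  vapply(split(values, groups), mean, numeric(1), na.rm = TRUE)
}

# Hypothetical helper: post-2008 group mean minus pre-2008 group mean.
mean_change_by_income <- function(pre, post, var = "DebtToIncomeRatio") {
  pre_means  <- group_means(pre[[var]],  pre$IncomeRange)
  post_means <- group_means(post[[var]], post$IncomeRange)
  lv <- union(names(pre_means), names(post_means))
  setNames(unname(post_means[lv]) - unname(pre_means[lv]), lv)
}

# Usage (not run here):
# mean_change_by_income(loan_sample_pre1, loan_sample_post1)
```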

The following scatter plot shows that homeowners tend to have a higher revolving credit balance on average and also tend to have higher incomes.

This is confirmed by the boxplot next to it.

A natural question is whether homeowners tend to have more days delinquent, given that they tend to have higher revolving credit balances. The plot below suggests otherwise: homeowners seem more “responsible” and better able to service their debt, with a lower average number of days delinquent, although the difference is surprisingly small (about a month).
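Whether a gap of about a month is distinguishable from noise can be checked with a two-sample test. A minimal sketch using a Welch t-test; `homeowner_delinquency_test` is a hypothetical helper, and the column names `LoanCurrentDaysDelinquent` and `IsBorrowerHomeowner` are assumptions about the Prosper extract.

```r
# Hypothetical helper: Welch two-sample t-test of days delinquent by
# homeowner status. Column names are assumptions.
homeowner_delinquency_test <- function(df,
                                       days  = "LoanCurrentDaysDelinquent",
                                       owner = "IsBorrowerHomeowner") {
  d <- data.frame(days = df[[days]], owner = as.character(df[[owner]]))
  d <- d[complete.cases(d), ]
  d$owner <- factor(d$owner)       # must leave exactly two groups
  t.test(days ~ owner, data = d)   # var.equal = FALSE (Welch) is the default
}

# Usage (not run here):
# homeowner_delinquency_test(loan_sample)
```

Days delinquent are heavily skewed, so the t-test is only a rough guide; a rank-based test would be a reasonable alternative.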

The plot below shows DebtToIncomeRatio against the percentage of delinquent amount for the different income ranges.

One thing that stands out is that borrowers in the $1-24,999 range seem to be the most vulnerable group (setting aside the three groups for which we have minimal data): their debt-to-income ratio appears highly correlated with their percentage of delinquent amount. This is much less pronounced in the higher income groups.
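The visual impression of a stronger link in the $1-24,999 group can be quantified with a correlation computed separately within each income range. A minimal sketch; `cor_by_income` is a hypothetical helper and `PctDelinquent` is a placeholder name for the derived percentage-of-delinquent-amount column. Groups with fewer than `min_n` usable rows return NA, reflecting the groups with minimal data.

```r
# Hypothetical helper: Pearson correlation between x and y within each
# IncomeRange level. `PctDelinquent` is a placeholder column name.
cor_by_income <- function(df, x = "DebtToIncomeRatio", y = "PctDelinquent",
                          min_n = 3) {
  groups <- split(df, df$IncomeRange, drop = TRUE)
  sapply(groups, function(g) {
    ok <- complete.cases(g[[x]], g[[y]])
    if (sum(ok) < min_n) return(NA_real_)  # too few points to be meaningful
    cor(g[[x]][ok], g[[y]][ok])
  })
}
```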

Final Plots and Summary

The following three plots were chosen as the final plots because they explore and answer three questions I found interesting.

Plot 1

Question

Is the wisdom of the crowd better than the ex-ante rate that the lender associates with the borrower (measured by BorrowerAPR)?

Analysis, thoughts and answer

To make the behaviour of the data clearer, we add a new variable, the maximum number of investors at each level of BorrowerAPR, and plot its smoothed line. Splitting the scatter plot into four quadrants then gives different intuitions about how well or badly the two measures perform, as seen from the intensity of red in each quadrant:

  1. Top left quadrant (being intense red) - Wisdom of crowd fails and lending company fails. Both the crowd and the lending company failed to identify delinquent borrowers.

  2. Bottom left quadrant (being intense red) - Wisdom of crowd succeeds where lending company fails. The crowd successfully “dodged the delinquency/principal loss bullet”, unlike the company.

  3. Top right quadrant (being intense red) - Wisdom of crowd fails while lending company succeeds. The lending company outperforms the masses.

  4. Bottom right quadrant (being intense red) - Wisdom of crowd succeeds and lending company succeeds. Both the crowd and the company successfully “dodged the delinquency/principal loss bullet”.
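The quadrant split described above can be sketched as a labelling step followed by a per-quadrant share of bad outcomes. This is a hypothetical illustration: `assign_quadrant` and `bad_share_by_quadrant` are invented helpers, and cutting at the means of BorrowerAPR and investors is an assumption, since the text does not say where the dividing lines were drawn.

```r
# Hypothetical helper: label each loan with its quadrant in the BorrowerAPR (x)
# vs investors (y) scatter. Mean cut-offs are an assumption.
assign_quadrant <- function(apr, investors,
                            apr_cut = mean(apr, na.rm = TRUE),
                            inv_cut = mean(investors, na.rm = TRUE)) {
  vertical   <- ifelse(investors >= inv_cut, "Top", "Bottom")  # crowd's view
  horizontal <- ifelse(apr >= apr_cut, "right", "left")        # lender's view
  quadrant <- paste(vertical, horizontal)
  quadrant[is.na(apr) | is.na(investors)] <- NA  # unclassifiable loans
  quadrant
}

# Share of loans with a bad outcome (e.g. delinquent) in each quadrant.
# `bad_outcome` is a logical vector aligned with `quadrant`.
bad_share_by_quadrant <- function(quadrant, bad_outcome) {
  vapply(split(bad_outcome, quadrant), mean, numeric(1), na.rm = TRUE)
}
```

A high share in the "Bottom left" cell, for example, corresponds to case 2 above: the crowd stayed away from loans the lending company priced as safe.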

When it comes to net principal loss, the lending company in general seems to be less successful than the investors. The red is concentrated in the left half of the graph, where the lending company appears to have misjudged borrowers by giving them a low rate, only to suffer significant losses in the end. Investors are not much better, but at least the scatter looks more uniform (i.e. consistent with random chance).

However, when it comes to delinquencies, the wisdom of the crowd seems to significantly outperform the lending company. The red is concentrated in the lower two quadrants of the graph and appears randomly scattered across BorrowerAPR. This suggests that even if lenders assigned lending rates at random, the delinquency pattern would not look very different (which is pretty damning).

Plot 2

Question

Given the change in lending standards post-2008, is there a significant improvement in the identification of high-risk borrowers?

Analysis, thoughts and answer

Using BorrowerAPR as a proxy for how lenders view a borrower (ex ante) and principal loss as a measure of realisation (ex post), we plot the pre-2008 and post-2008 graphs. Straightaway, there is a significant difference between the two.

  1. Lenders are a lot more cautious post-2008, with the 99th percentile of principal losses being 1.83 times higher pre-2008 than post-2008 (as seen from the horizontal black line).

  2. Principal losses due to “high risk borrowers” also seem to be “capped” post-2008 at about $15,000. Perhaps this is the result of stricter borrowing standards.

  3. The mean LenderYield has increased only marginally post-2008 (by 5%, to about 25%).

  4. Lenders are still surprisingly bad at assessing “low risk borrowers”, as we can see from the top left quadrant, where the percentages have increased post-2008. (Lenders view borrowers in the top left quadrant as less risky than the average borrower, yet these loans suffered a net principal loss above the 99th percentile.)

##     99% 
## 1.66164
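The 99th-percentile comparison in point 1 is a ratio of two quantiles. A minimal sketch of that calculation; `p99_ratio` is a hypothetical helper, and treating `LP_NetPrincipalLoss` as the principal-loss column is an assumption. The named `99%` value printed above has the shape such a ratio would produce, but the source does not show the code behind it.

```r
# Hypothetical helper: 99th percentile of `var` pre-2008 divided by the
# 99th percentile post-2008 (R's default type-7 quantiles).
p99_ratio <- function(pre, post, var = "LP_NetPrincipalLoss", prob = 0.99) {
  q_pre  <- quantile(pre[[var]],  probs = prob, na.rm = TRUE)
  q_post <- quantile(post[[var]], probs = prob, na.rm = TRUE)
  q_pre / q_post  # keeps the "99%" name from quantile()
}

# Usage (not run here):
# p99_ratio(loan_sample_pre, loan_sample_post)
```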
## [1] -0.0384003

Plot 3

Question

Why is the number of borrowers from the higher income groups increasing, even as the amount of credit available is increasing for the lower income groups instead?

Analysis, thoughts and answer

A cynical (some would say worldly) claim might be that lenders find lending to high-income individuals simply unprofitable and a drag on margins, and have therefore reduced the amount of credit available to them while correspondingly redeploying the freed-up resources to lower-income individuals.

This claim seems to hold more water.

The top two boxplots show observable differences in the yield lenders received from the different income groups, especially post-2008.

Given that the post-2008 sample size is about twice that of the pre-2008 sample, I reduced the alpha by half for the pre-2008 samples to make the color intensity “equivalent”. With that in mind, the bottom two scatter plots of loan delinquency seem to show something interesting.

First and foremost, lenders have become a lot better at managing delinquency (the number of delinquent days has fallen almost fivefold). More importantly, the color intensity of the two groups ($25,000-49,999 and $50,000-74,999) relative to the rest has visibly increased post-2008. Together with a higher LenderYield, this suggests that these groups have become much more profitable for the lenders (assuming no defaults).

This suggests that higher-income individuals are applying more frequently because they are “less likely” to be granted sufficient credit balances for their purposes, as resources have been diverted to the more profitable lower-income borrowers, coupled with a significant improvement in delinquency management by the lenders.

## loan_sample_pre1$IncomeRange: Not displayed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0450  0.1265  0.1750  0.1799  0.2340  0.4325 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: Not employed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.03000 0.07480 0.10500 0.09849 0.12990 0.14500 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1110  0.1478  0.2139  0.2197  0.2858  0.3400 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.1250  0.1690  0.1820  0.2248  0.3400 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1192  0.1595  0.1759  0.2268  0.3400 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0395  0.1140  0.1520  0.1668  0.2060  0.3400 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0295  0.0950  0.1478  0.1569  0.2025  0.3400 
## -------------------------------------------------------- 
## loan_sample_pre1$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.0950  0.1344  0.1495  0.1895  0.3400
## loan_sample_post1$IncomeRange: Not displayed
## NULL
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: Not employed
## NULL
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.285   0.285   0.285   0.285   0.285   0.285 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $1-24,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0510  0.1812  0.2396  0.2292  0.3025  0.3400 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $25,000-49,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0399  0.1501  0.2099  0.2071  0.2649  0.3400 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $50,000-74,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0465  0.1299  0.1799  0.1865  0.2449  0.3400 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0450  0.1199  0.1747  0.1796  0.2387  0.3400 
## -------------------------------------------------------- 
## loan_sample_post1$IncomeRange: $100,000+
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0399  0.1100  0.1579  0.1655  0.2118  0.3400
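To check whether the post-2008 yield shift within an income range is more than sampling noise, one option is a rank-sum test per group. A minimal sketch using `wilcox.test`; `compare_yield_by_income` is a hypothetical helper, and it assumes the variable summarised above is `LenderYield`, which the output itself does not name.

```r
# Hypothetical helper: for each IncomeRange level present in both samples,
# compare `var` post- vs pre-2008 with a two-sided Wilcoxon rank-sum test.
compare_yield_by_income <- function(pre, post, var = "LenderYield", min_n = 5) {
  pre_g  <- split(pre[[var]],  as.character(pre$IncomeRange))
  post_g <- split(post[[var]], as.character(post$IncomeRange))
  common <- intersect(names(pre_g), names(post_g))
  rows <- lapply(common, function(lv) {
    x <- na.omit(pre_g[[lv]])
    y <- na.omit(post_g[[lv]])
    if (length(x) < min_n || length(y) < min_n) return(NULL)  # too little data
    p <- suppressWarnings(wilcox.test(y, x)$p.value)  # ties -> normal approx.
    data.frame(IncomeRange = lv,
               median_shift = median(y) - median(x),
               p_value = p)
  })
  do.call(rbind, rows)
}

# Usage (not run here):
# compare_yield_by_income(loan_sample_pre1, loan_sample_post1)
```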

Reflections

There are various things in the data set that I found extremely interesting and that would certainly be worth exploring further with more data.

For example, I am quite curious why richer people are so much more willing to tap into credit lines post-2008. Is there a reduction in social stigma? Is it because pay is not keeping pace with inflation? Or is it because they have received “promotional deals”, such as cheaper access to credit lines for investment purposes? These questions could be cross-referenced against a different snapshot or an entirely different data set to gain more insight into consumer behaviour. In this way, we could tailor better-suited products to each income group and also prevent unsuitable and potentially litigious products from being released to segments of the population that are ill-suited for them.

I find the structural changes in the data extremely interesting, and there are various analyses that could be done on the sample, such as examining the apparent “cap” on BorrowerAPR in the last plot. Has the “cap” encouraged or discouraged lending and borrowing? What are its implications for delinquencies? Such questions help in policy analysis: is the lender yield cap of 30% achieving what it set out to achieve, or is it leading to exploitation or other unintended consequences?

The main difficulty encountered comes from the incompleteness of the data set. Sometimes it is not immediately clear whether data are missing because of a structural feature of the data or because of poor data collection. This makes it extremely tricky to look at subsets that exclude missing observations, as we might miss a trend or, in the worst case, introduce structural flaws into the data.